Discriminative Topic Mining for Social Spam Detection

Authors

  • Long Song
  • Raymond Y. K. Lau
  • ChunXiao Yin
Abstract

In the era of the Social Web, there has been explosive growth in user-contributed comments posted to online social media. However, the growing volume of misleading and deceptive user comments on these sites has become a serious concern for consumers and merchants, and in recent years social spam has also drawn the attention of the legal community. Social spam can cause tremendous losses to both consumers and merchants, so there is a pressing need for effective methodologies that detect social spam and maintain the hygiene of online social media. The main contribution of this paper is a novel social spam detection methodology that combines word-, topic-, and user-based features to combat social spam. In particular, the proposed methodology is underpinned by the Labeled Latent Dirichlet Allocation (L-LDA) model, a probabilistic generative model. A series of experiments on social comments posted to YouTube shows that our proposed methodology achieves a detection accuracy of 91.17%. The business implication of our research is that merchants can apply our methodology to filter spam and thereby extract accurate market intelligence from online social media. Moreover, social media site owners can leverage the proposed methodology to maintain the hygiene of their sites.
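
The abstract names L-LDA but does not describe how it is applied. In Labeled LDA, each document's topic assignments are restricted to the topics that correspond to its observed labels; otherwise training proceeds as in ordinary LDA. The sketch below is a minimal collapsed Gibbs sampler enforcing that constraint, written in plain Python. It is not the authors' implementation: the function name `train_llda`, the hyperparameters, and the toy setup in which every comment carries a shared background topic plus either a `ham` or a `spam` topic are hypothetical choices made for illustration.

```python
import random


def train_llda(docs, labels, num_topics, vocab_size,
               alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for Labeled LDA (illustrative sketch).

    docs   -- list of documents, each a list of word ids in [0, vocab_size)
    labels -- per-document list of allowed topic ids (the document's labels)
    Returns phi, where phi[k][w] estimates p(word w | topic k).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of every token

    # Initialise: each token gets a random topic from its document's label set.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(labels[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            allowed = labels[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # LDA full conditional, evaluated only over the document's
                # labels: this restriction is the L-LDA constraint.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in allowed]
                k = rng.choices(allowed, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimate of each topic's word distribution.
    return [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
             for w in range(vocab_size)] for k in range(num_topics)]


if __name__ == "__main__":
    # Hypothetical toy corpus: topic 0 = shared background, 1 = ham, 2 = spam.
    vocab = ["free", "click", "song", "great", "the"]
    docs = [[0, 1, 4, 0, 1, 4], [0, 1, 4, 1], [2, 3, 4, 2, 3, 4], [2, 3, 4, 3]]
    labels = [[0, 2], [0, 2], [0, 1], [0, 1]]
    phi = train_llda(docs, labels, num_topics=3, vocab_size=len(vocab))
    for k, name in enumerate(["background", "ham", "spam"]):
        top = sorted(range(len(vocab)), key=lambda w: -phi[k][w])[:2]
        print(name, [vocab[w] for w in top])
```

Because a token can only be assigned to a topic in its document's label set, in this toy setup words that occur only in spam comments never receive counts in the ham topic (and vice versa), so the class topics capture class-specific vocabulary while shared words can drift to the background topic. How the paper turns mined topics into features alongside its word- and user-based features is not specified in the abstract.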

Similar resources

Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter

Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques on social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam twee...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet predominate in users' lives, they bring not only business opportunities and time savings but also threats such as information theft and system penetration in both hardware and software. Security is the top priority for preventing cyber-attacks, and users should first detect the type of attack because virtual environments are not moni...

Detecting Pharmaceutical Spam in Microblog Messages

Microblogs belong to a growing group of social network tools. Twitter is, at present, one of the most popular and fastest-growing forums for microblogging in online social networks. Fifty million messages on a wide variety of topics flow daily through servers, computers, and cell phones. With this considerable volume, Twitter is a natural and obvious target for spreading spam vi...

SocialSpamGuard: A Data Mining-Based Spam Detection System for Social Media Networks

We have entered the era of social media networks represented by Facebook, Twitter, YouTube and Flickr. Internet users now spend more time on social networks than on search engines. Business entities or public figures set up social networking pages to enhance direct interactions with online users. Social media systems heavily depend on users for content contribution and sharing. Information is spre...

Learning to Represent Review with Tensor Decomposition for Spam Detection

Review spam detection is a key task in opinion mining. To accomplish this type of detection, previous work has focused mainly on effectively representing fake and non-fake reviews with discriminative features, which are discovered or elaborately designed by experts or developers. This paper proposes a novel review spam detection method that learns the representation of reviews automatically ins...

Publication date: 2014